ABSTRACT

Predictive models are fundamental to engineering reliable 
software systems. However, designing conservative, com- 
putable approximations for the behavior of programs (static 
analyses) remains a difficult and error-prone process for mod- 
ern high-level programming languages. What analysis de- 
signers need is a principled method for navigating the gap 
between semantics and analytic models: analysis designers 
need a method that tames the interaction of complex language
features such as higher-order functions, recursion,
exceptions, continuations, objects and dynamic allocation. 

We contribute a systematic approach to program analysis 
that yields novel and transparently sound static analyses. 
Our approach relies on existing derivational techniques to 
transform high-level language semantics into low-level de- 
terministic state-transition systems (with potentially infinite 
state spaces). We then perform a series of simple machine 
refactorings to obtain a sound, computable approximation, 
which takes the form of a non-deterministic state-transition
system with a finite state space. The approach scales up
uniformly to enable program analysis of realistic language 
features, including higher-order functions, tail calls, condi- 
tionals, side effects, exceptions, first-class continuations, and 
even garbage collection. 

1. INTRODUCTION 

Software engineering, compiler optimizations, program par- 
allelization, system verification, and security assurance de- 
pend on program analysis, a ubiquitous and central theme 
of programming language research. At the same time, the 
production of modern software systems employs expressive, 
higher-order languages such as Java, JavaScript, C#, Python, 
Ruby, etc., implying a growing need for fast, precise, and 
scalable higher-order program analyses. 

Program analysis aims to soundly predict properties of
programs before they are run. (Sound in program analysis
means "conservative approximation": if a sound analysis
says a program must not exhibit a behavior, then that program
will not exhibit that behavior; but if a sound analysis
says a program may exhibit a behavior, then it may or
may not exhibit that behavior.) For over thirty years, the
research community has expended significant effort designing
effective analyses for higher-order programs [13]. Past
approaches have focused on connecting high-level language 
semantics such as structured operational semantics, deno- 
tational semantics, or reduction semantics to equally high- 
level but dissimilar analytic models. These models are too
often far removed from their programming language counterparts
and take the form of constraint languages specified
as relations on sets of program fragments [25, 18, 12].
These approaches require significant ingenuity in their de- 
sign and involve complex constructions and correctness ar- 
guments, making it difficult to establish soundness, design 
algorithms, or grow the language under analysis. Moreover, 
such analytic models, which focus on "value flow", i.e., deter- 
mining which syntactic values may show up at which pro- 
gram sites at run-time, have a limited capacity to reason 
about many low-level intensional properties such as mem- 
ory management, stack behavior, or trace-based properties 
of computation. Consequently, higher-order program anal- 
ysis has had limited impact on large-scale systems, despite 
the apparent potential for program analysis to aid in the 
construction of reliable and efficient software. 

In this paper, we describe a systematic approach to pro- 
gram analysis that overcomes many of these limitations by 
providing a straightforward derivation process, lowering ver- 
ification costs and accommodating sophisticated language 
features and program properties. 

Our approach relies on leveraging existing techniques to 
transform high-level language semantics into abstract ma- 
chines — low-level deterministic state-transition systems with 
potentially infinite state spaces. Abstract machines [11], and 
the paths from semantics to machines [20, 5, 7], have a long
history in the research on programming languages. 

From an abstract machine, which represents the idealized 
core of a realistic run-time system, we perform a series of basic
machine refactorings to obtain a non-deterministic state-transition
system with a finite state space. The refactorings
are simple: (1) variable bindings and the control stack are 
redirected through the machine's store and (2) the store is 
bounded to a finite size. Due to finiteness, store updates 
must become merges, leading to the possibility of multiple
values residing in a single store location. This in turn
requires store look-ups be replaced by a non-deterministic 
choice among the multiple values at a given location. The 
derived machine computes a sound approximation of the 
original machine, and thus forms an abstract interpretation 
of the machine and the high-level semantics. 

The approach scales up uniformly to enable program anal- 
ysis of realistic language features, including higher-order 
functions, tail calls, conditionals, side effects, exceptions, 
first-class continuations, and even garbage collection. Thus, 
we are able to refashion semantic techniques used to model 
language features into abstract interpretation techniques for 
reasoning about the behavior of those very same features. 

Background and notation: We present a brief introduction 
to reduction semantics and abstract machines. For back- 
ground and a more extensive introduction to the concepts, 
terminology, and notation employed in this paper, we refer 
the reader to Semantics Engineering with PLT Redex [7].

2. FROM SEMANTICS TO MACHINES AND 
MACHINES TO ANALYSES 

In this section, we demonstrate our systematic approach 
to analysis by stepping through a derivation from the high- 
level semantics of a prototypical higher-order programming 
language to a low-level abstract machine, and from the ab- 
stract machine to a sound and computable analytic model 
that predicts intensional properties of that machine. As a 
prototypical language, we choose the call-by-value $\lambda$-calculus
[19], a core computational model for both functional and
object-oriented languages. We choose to model program be- 
havior with a simple operational model given in the form 
of a reduction semantics. Despite this simplicity, reduction 
semantics scale to full-fledged programming languages [22],
although the choice is somewhat arbitrary since it is known
how to construct abstract machines from a number of semantic
paradigms [5]. In subsequent sections, we demonstrate
that the approach handles richer language features such as
control, state, and garbage collection, and we have success- 
fully employed the same method to statically reason about 
language features such as laziness, exceptions, and stack- 
inspection, and programming languages such as Java and 
JavaScript. In all cases, analyses are derived following the 
systematic approach presented here. 

2.1 Reduction semantics 

To begin, consider the following language of expressions: 

$$e \in \mathit{Exp} = x \mid (e\,e) \mid (\lambda x.e)$$

$$x \in \mathit{Var} \quad \text{an infinite set of identifiers.}$$

The syntax of expressions includes variables, applications, 
and functions. Values v, for the purposes of this language,
include only function terms, $(\lambda x.e)$. We say x is the formal
parameter of the function $(\lambda x.e)$, and e is its body. A
program is a closed expression, i.e., an expression in which 
every variable occurs within some function that binds that 
variable as its formal parameter. Call-by-value reduction is 
characterized by the relation $\mathbf{v}$:

$$((\lambda x.e)\,v) \;\mathbf{v}\; [v/x]e,$$

which states that a function applied to a value reduces to 
the body of the function with every occurrence of the formal 
parameter replaced by the value. The expression on the left-hand
side is known as a redex and the right-hand side is
its contractum.

Reduction can occur within an evaluation context,
defined by the following grammar:

$$E = [\,] \mid (E\,e) \mid (v\,E).$$

An evaluation context can be thought of as an expression 
with a single "hole" in it, which is where a redex may be re- 
duced. It is straightforward to observe that for all programs, 
either the program is a value, or it decomposes uniquely 
into an evaluation context and redex, written $E[((\lambda x.e)\,v)]$.
Thus the grammar as given specifies a deterministic reduc- 
tion strategy, which is formalized as a standard reduction 
relation on programs: 

$$E[e] \longmapsto_{\mathbf{v}} E[e'], \text{ if } e \;\mathbf{v}\; e'.$$

The evaluation of a program is defined by a partial function 
relating programs to values [7, page 67]: 

$$\mathit{eval}(e) = v \text{ if } e \longmapsto^{*}_{\mathbf{v}} v, \text{ for some } v,$$

where $\longmapsto^{*}_{\mathbf{v}}$ denotes the reflexive, transitive closure of the
standard reduction relation.

We have now established the high-level semantic basis for 
our prototypical language. The semantics is in the form of 
an evaluation function defined by the reflexive, transitive 
closure of the standard reduction relation. However, the 
evaluation function as given does not shed much light on 
a realistic implementation. At each step, the program is 
traversed according to the grammar of evaluation contexts 
until a redex is found. When found, the redex is reduced 
and the contractum is plugged back into the context. The 
process is then repeated, again traversing from the beginning 
of the program. Abstract machines offer an extensionally 
equivalent but more realistic model of evaluation that short- 
cuts the plugging of a contractum back into a context and 
the subsequent decomposition [6]. 
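
As an illustration of the standard reduction relation, the following is a minimal Python sketch. It is not part of the original development: the tuple encoding of terms (`"x"` for variables, `("lam", x, e)`, `("app", e0, e1)`) and the names `is_value`, `subst`, `step`, and `evaluate` are assumptions made here. The `step` function follows the evaluation-context grammar, so it re-traverses the program from the top on every step, which is exactly the cost an abstract machine avoids.

```python
def is_value(e):
    """Values are exactly the function terms (lam x. e)."""
    return isinstance(e, tuple) and e[0] == "lam"

def subst(e, x, v):
    """Compute [v/x]e. Assumes v is closed, so no variable capture can occur."""
    if isinstance(e, str):                      # variable
        return v if e == x else e
    if e[0] == "lam":                           # stop at a shadowing binder
        _, y, body = e
        return e if y == x else ("lam", y, subst(body, x, v))
    _, e0, e1 = e                               # application
    return ("app", subst(e0, x, v), subst(e1, x, v))

def step(e):
    """One standard reduction step; None if e is a value or is stuck."""
    if isinstance(e, str) or is_value(e):
        return None
    _, e0, e1 = e
    if not is_value(e0):                        # context (E e)
        r = step(e0)
        return None if r is None else ("app", r, e1)
    if not is_value(e1):                        # context (v E)
        r = step(e1)
        return None if r is None else ("app", e0, r)
    _, x, body = e0                             # redex ((lam x. e) v)
    return subst(body, x, e1)

def evaluate(e, fuel=10_000):
    """Iterate step until no redex remains; fuel guards against divergence."""
    for _ in range(fuel):
        nxt = step(e)
        if nxt is None:
            return e
        e = nxt
    raise RuntimeError("out of fuel")

identity = ("lam", "x", "x")
print(evaluate(("app", identity, identity)))    # prints ('lam', 'x', 'x')
```

The `fuel` bound is only a convenience of the sketch, since eval is a partial function and some programs never reach a value.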

2.2 CEK machine 

The CEK machine [20, Interpreter III] [7, page 100] is a
state transition system that efficiently performs evaluation 
of a program. There are two key ideas in its construction, 
which can be carried out systematically [2]. The first is that 
substitution, which is not a viable implementation strategy, 
is instead represented in a delayed, explicit manner as an environment
structure. So a substitution [v/x]e is represented
by e and an environment that maps x to v. Since e and v
may have previous substitutions applied, this will likewise
be represented with environments. So in general, if $\rho$ is the
environment of e and $\rho'$ is the environment of v, then we
represent [v/x]e by e in the environment $\rho$ extended with a
mapping of x to $(v, \rho')$, written $\rho[x \mapsto (v, \rho')]$. The pairing
of a value and an environment is known as a closure [11].

The second key idea is that evaluation contexts are con- 
structed inside-out and represent continuations: 

1. [ ] is represented by mt; 

2. $E[([\,]\,e)]$ is represented by $\mathbf{ar}(e', \rho, \kappa)$ where $\rho$ closes $e'$
to represent e and $\kappa$ represents E; and

3. $E[(v\,[\,])]$ is represented by $\mathbf{fn}(v', \rho, \kappa)$ where $\rho$ closes
$v'$ to represent v and $\kappa$ represents E.

In this way, evaluation contexts form a program stack: mt 
is the empty stack, and ar and fn are frames. 

Figure 1: CEK machine.



States of the CEK machine are triples consisting of an 
expression, an environment that closes the control string, 
and a continuation: 

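$$\begin{aligned}
\varsigma \in \Sigma &= \mathit{Exp} \times \mathit{Env} \times \mathit{Cont} \\
v \in \mathit{Val} &= (\lambda x.e) \\
\rho \in \mathit{Env} &= \mathit{Var} \rightarrow_{\mathrm{fin}} \mathit{Val} \times \mathit{Env} \\
\kappa \in \mathit{Cont} &= \mathbf{mt} \mid \mathbf{ar}(e, \rho, \kappa) \mid \mathbf{fn}(v, \rho, \kappa).
\end{aligned}$$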

The transition function for the CEK machine is defined in
Figure 1. The initial machine state for a program e is given
by the $\mathit{inj}_{CEK}$ function:

$$\mathit{inj}_{CEK}(e) = \langle e, \emptyset, \mathbf{mt} \rangle.$$

Evaluation is defined by the reflexive, transitive closure of
the machine transition relation and a "real" function [19,
page 129] that maps closures to the term represented:

$$\mathit{eval}_{CEK}(e) = \mathit{real}(v, \rho), \text{ where } \mathit{inj}_{CEK}(e) \longmapsto^{*}_{CEK} \langle v, \rho, \mathbf{mt} \rangle,$$

which is equivalent to the eval function of Section 2.1.

Lemma 1 (CEK Correctness [7]) $\mathit{eval}_{CEK} \simeq \mathit{eval}$.
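
The machine described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: terms use a hypothetical tuple encoding (`"x"`, `("lam", x, e)`, `("app", e0, e1)`), environments are dictionaries from variables to closures, and the four transitions are the usual CEK rules, which we assume to match Figure 1. The names `cek_step`, `real`, and `eval_cek` are ours.

```python
MT = ("mt",)  # the empty continuation

def cek_step(state):
    """One CEK transition; returns None when the state is final."""
    e, rho, k = state
    if isinstance(e, str):                       # variable: fetch its closure
        v, rho_v = rho[e]
        return (v, rho_v, k)
    if e[0] == "app":                            # push an ar frame, evaluate operator
        _, e0, e1 = e
        return (e0, rho, ("ar", e1, rho, k))
    if k[0] == "ar":                             # operator is a value: evaluate operand
        _, e1, rho1, k1 = k
        return (e1, rho1, ("fn", e, rho, k1))
    if k[0] == "fn":                             # operand is a value: enter the body
        _, (_, x, body), rho1, k1 = k
        return (body, {**rho1, x: (e, rho)}, k1)
    return None                                  # a value with the mt continuation

def real(e, rho, bound=frozenset()):
    """Map a closure back to the term it represents by applying the
    delayed substitutions recorded in the environment."""
    if isinstance(e, str):
        if e in bound or e not in rho:
            return e
        v, rho_v = rho[e]
        return real(v, rho_v)
    if e[0] == "lam":
        return ("lam", e[1], real(e[2], rho, bound | {e[1]}))
    return ("app", real(e[1], rho, bound), real(e[2], rho, bound))

def eval_cek(e):
    """Run from inj(e) = (e, {}, mt) to a final state and read back the value."""
    state = (e, {}, MT)
    while True:
        nxt = cek_step(state)
        if nxt is None:
            v, rho, _ = state
            return real(v, rho)
        state = nxt
```

For example, applying the constant-function combinator to the identity yields a closure whose environment binds x, and `real` substitutes that binding back into the term.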

We have now established a correct low-level evaluator for 
our prototypical language that is extensionally equivalent to 
the high-level reduction semantics. However, program anal- 
ysis is not just concerned with the result of a computation, 
but also with how it was produced, i.e., analysis should pre- 
dict intensional properties of the machine as it runs a pro- 
gram. We therefore adopt a reachable states semantics that 
relates a program to the set of all its intermediate steps: 

$$\mathit{CEK}(e) = \{\varsigma \mid \mathit{inj}_{CEK}(e) \longmapsto^{*}_{CEK} \varsigma\}.$$

Membership in the set of reachable states is straightforwardly
undecidable. The goal of analysis, then, is to construct
an abstract interpretation [4], that is, a sound and computable
approximation of the CEK function.

We can do this by constructing a machine that is similar
in structure to the CEK machine: it is defined by an
abstract state transition relation, $\longmapsto_{\widehat{CEK}}$, which operates
over abstract states, $\hat\Sigma$, that approximate states of the CEK
machine. Abstract evaluation is then defined as:

$$\widehat{\mathit{CEK}}(e) = \{\hat\varsigma \mid \widehat{\mathit{inj}}(e) \longmapsto^{*}_{\widehat{CEK}} \hat\varsigma\}.$$

1. Soundness is achieved by showing transitions preserve
approximation, so that if $\varsigma \longmapsto_{CEK} \varsigma'$ and $\hat\varsigma$ approximates
$\varsigma$, then there exists an abstract state $\hat\varsigma'$ such
that $\hat\varsigma \longmapsto_{\widehat{CEK}} \hat\varsigma'$ and $\hat\varsigma'$ approximates $\varsigma'$.

2. Decidability is achieved by constructing the approximation
in such a way that the state-space of the abstracted
machine is finite, which guarantees that for
any program e, the set $\widehat{\mathit{CEK}}(e)$ is finite.



An attempt at approximation: A simple approach to 
abstracting the machine's state space is to apply a struc- 
tural abstraction, which lifts approximation across the struc- 
ture of a machine state, i.e., expressions, environments, and 
continuations. The problem with the structural abstraction 
approach for the CEK machine is that both environments 
and continuations are recursive structures. As a result, the 
abstraction yields objects in an abstract state-space with 
recursive structure, implying the space is infinite. 

Focusing on recursive structure as the source of the prob- 
lem, our course of action is to add a level of indirection, forc- 
ing recursive structure to pass through explicitly allocated 
addresses. Doing so unhinges the recursion in the machine's 
data structures, enabling structural abstraction via a single 
point of approximation: the store. 

The next section covers the first of the two steps for refac- 
toring the CEK machine into its computable approximation: 
a store component is introduced to machine states and vari- 
able bindings and continuations are redirected through the 
store. This step introduces no approximation and the con- 
structed machine operates in lock-step with the CEK ma- 
chine. However, the machine is amenable to a direct struc- 
tural abstraction. 

2.3 CESK* machine 

The states of the CESK* machine extend those of the 
CEK machine to include a store, which provides a level of 
indirection for variable bindings and continuations to pass 
through. The store is a finite map from addresses to storable 
values, which includes closures and continuations, and envi- 
ronments are changed to map variables to addresses. When 
a variable's value is looked-up by the machine, it is now 
accomplished by using the environment to look up the vari- 
able's address, which is then used to look up the value. To 
bind a variable to a value, a fresh location in the store is allo- 
cated and mapped to the value; the environment is extended 
to map the variable to that address. 

To untie the recursive structure associated with contin- 
uations, we likewise add a level of indirection through the 
store and replace the continuation component of the machine 
with a pointer to a continuation in the store. We term the 
resulting machine the CESK* (control, environment, store, 
continuation pointer) machine. 

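$$\begin{aligned}
\varsigma \in \Sigma &= \mathit{Exp} \times \mathit{Env} \times \mathit{Store} \times \mathit{Addr} \\
s \in \mathit{Storable} &= \mathit{Val} \times \mathit{Env} + \mathit{Cont} \\
\kappa \in \mathit{Cont} &= \mathbf{mt} \mid \mathbf{ar}(e, \rho, a) \mid \mathbf{fn}(v, \rho, a).
\end{aligned}$$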

The transition function for the CESK* machine is defined
in Figure 2. The initial state for a program is given by
the $\mathit{inj}_{CESK^*}$ function, which combines the expression with
the empty environment and a store with a single pointer to
the empty continuation, whose address serves as the initial
continuation pointer:

$$\mathit{inj}_{CESK^*}(e) = \langle e, \emptyset, [a_0 \mapsto \mathbf{mt}], a_0 \rangle.$$

An evaluation function based on this machine is defined
following the template of the CEK evaluation given in Section 2.2:

$$\mathit{eval}_{CESK^*}(e) = \mathit{real}(v, \rho, \sigma), \text{ where } \mathit{inj}_{CESK^*}(e) \longmapsto^{*}_{CESK^*} \langle v, \rho, \sigma, a_0 \rangle,$$

where the real function is suitably extended to follow the
environment's indirection through the store.



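$$\varsigma \longmapsto_{CESK^*} \varsigma', \text{ where } \kappa = \sigma(a),\ b \notin \mathit{dom}(\sigma)$$

$$\begin{array}{ll}
\langle x, \rho, \sigma, a \rangle & \langle v, \rho', \sigma, a \rangle \text{ where } (v, \rho') = \sigma(\rho(x)) \\
\langle (e_0\,e_1), \rho, \sigma, a \rangle & \langle e_0, \rho, \sigma[b \mapsto \mathbf{ar}(e_1, \rho, a)], b \rangle \\
\langle v, \rho, \sigma, a \rangle & \\
\quad \text{if } \kappa = \mathbf{ar}(e, \rho', c) & \langle e, \rho', \sigma[b \mapsto \mathbf{fn}(v, \rho, c)], b \rangle \\
\quad \text{if } \kappa = \mathbf{fn}((\lambda x.e), \rho', c) & \langle e, \rho'[x \mapsto b], \sigma[b \mapsto (v, \rho)], c \rangle
\end{array}$$

Figure 2: CESK* machine.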

We also define the set of reachable machine states: 

$$\mathit{CESK}^*(e) = \{\varsigma \mid \mathit{inj}_{CESK^*}(e) \longmapsto^{*}_{CESK^*} \varsigma\}.$$

Observe that for any program, the CEK and CESK* ma- 
chines operate in lock-step: each machine transitions, by the 
corresponding rule, if and only if the other machine transi- 
tions. 

Lemma 2 $\mathit{CESK}^*(e) \simeq \mathit{CEK}(e)$.

The above lemma implies correctness of the machine. 

Lemma 3 (CESK* Correctness) $\mathit{eval}_{CESK^*} \simeq \mathit{eval}$.
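
The lock-step claim can be observed by running both machines side by side. The following Python sketch is an illustration under stated assumptions rather than a proof: terms use a hypothetical tuple encoding, the CEK rules are the usual ones, fresh addresses are chosen as `len(sigma)` (one of many valid choices for b outside dom(σ)), and the names `cek_step`, `cesk_step`, and `lockstep` are ours.

```python
def cek_step(s):
    """CEK transition (continuations nested inside one another)."""
    e, rho, k = s
    if isinstance(e, str):
        v, rho_v = rho[e]
        return (v, rho_v, k)
    if e[0] == "app":
        return (e[1], rho, ("ar", e[2], rho, k))
    if k[0] == "ar":
        _, e1, rho1, k1 = k
        return (e1, rho1, ("fn", e, rho, k1))
    if k[0] == "fn":
        _, (_, x, body), rho1, k1 = k
        return (body, {**rho1, x: (e, rho)}, k1)
    return None

def cesk_step(s):
    """CESK* transition: bindings and continuations live in the store."""
    e, rho, sigma, a = s
    if isinstance(e, str):                       # two-stage lookup: rho, then sigma
        v, rho_v = sigma[rho[e]]
        return (v, rho_v, sigma, a)
    b = len(sigma)                               # fresh: addresses are 0..len(sigma)-1
    if e[0] == "app":
        return (e[1], rho, {**sigma, b: ("ar", e[2], rho, a)}, b)
    k = sigma[a]                                 # dereference the continuation pointer
    if k[0] == "ar":
        _, e1, rho1, c = k
        return (e1, rho1, {**sigma, b: ("fn", e, rho, c)}, b)
    if k[0] == "fn":
        _, (_, x, body), rho1, c = k
        return (body, {**rho1, x: b}, {**sigma, b: (e, rho)}, c)
    return None

def lockstep(e):
    """Run both machines together; return (number of steps, final CESK* state)."""
    cek, cesk, steps = (e, {}, ("mt",)), (e, {}, {0: ("mt",)}, 0), 0
    while True:
        n1, n2 = cek_step(cek), cesk_step(cesk)
        assert (n1 is None) == (n2 is None), "one machine stepped, the other did not"
        if n1 is None:
            return steps, cesk
        assert n1[0] == n2[0], "control strings diverged"
        cek, cesk, steps = n1, n2, steps + 1
```

On termination the CESK* continuation pointer is back at the initial address, mirroring the final state ⟨v, ρ, σ, a0⟩ used in the evaluation function.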

Addresses, abstraction and allocation: The CESK*
machine, as defined in Figure 2, nondeterministically chooses
addresses when it allocates a location in the store, but because
machines are identified up to consistent renaming of
addresses, the transition system remains deterministic.

Looking ahead, an easy way to bound the state-space of 
this machine is to bound the set of addresses. But once the 
store is finite, locations may need to be reused, and when
multiple values are to reside in the same location, the store
will have to soundly approximate this by joining the values.

In our concrete machine, all that matters about an alloca- 
tion strategy is that it picks an unused address. In the ab- 
stracted machine however, the strategy will all but certainly 
have to re-use previously allocated addresses. The abstract 
allocation strategy is therefore crucial to the design of the 
analysis — it indicates when finite resources should be doled 
out and decides when information should deliberately be 
lost in the service of computing within bounded resources. 
In essence, the allocation strategy is the heart of an analysis. 

For this reason, concrete allocation deserves a bit more 
attention in the machine. An old idea in program analysis
is that dynamically allocated storage can be represented
by the state of the computation at allocation time [10], [13,
Section 1.2.2]. That is, allocation strategies can be based
on a (representation of the) machine history. Since machine
histories are always fresh, we call them time-stamps.

A common choice for a time-stamp, popularized by Shiv- 
ers [3T], is to represent the history of the computation as 
contours, finite strings encoding the calling context. We 
present a concrete machine that uses a general time-stamp 
approach and is parameterized by a choice of tick and alloc 
functions. 

2.4 Time-stamped CESK* machine 

The machine states of the time-stamped CESK* machine
include a time component, which is intentionally left unspecified:

$$t, u \in \mathit{Time}$$
$$\varsigma \in \Sigma = \mathit{Exp} \times \mathit{Env} \times \mathit{Store} \times \mathit{Addr} \times \mathit{Time}.$$



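$$\varsigma \longmapsto_{CESK^*_t} \varsigma', \text{ where } \kappa = \sigma(a),\ b = \mathit{alloc}(\varsigma),\ u = \mathit{tick}(\varsigma)$$

$$\begin{array}{ll}
\langle x, \rho, \sigma, a, t \rangle & \langle v, \rho', \sigma, a, u \rangle \text{ where } (v, \rho') = \sigma(\rho(x)) \\
\langle (e_0\,e_1), \rho, \sigma, a, t \rangle & \langle e_0, \rho, \sigma[b \mapsto \mathbf{ar}(e_1, \rho, a)], b, u \rangle \\
\langle v, \rho, \sigma, a, t \rangle & \\
\quad \text{if } \kappa = \mathbf{ar}(e, \rho', c) & \langle e, \rho', \sigma[b \mapsto \mathbf{fn}(v, \rho, c)], b, u \rangle \\
\quad \text{if } \kappa = \mathbf{fn}((\lambda x.e), \rho', c) & \langle e, \rho'[x \mapsto b], \sigma[b \mapsto (v, \rho)], c, u \rangle
\end{array}$$

Figure 3: Time-stamped CESK* machine.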

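The machine is parameterized by the functions:

$$\mathit{tick} : \Sigma \rightarrow \mathit{Time} \qquad \mathit{alloc} : \Sigma \rightarrow \mathit{Addr}.$$

The tick function returns the next time; the alloc function
allocates a fresh address for a binding or continuation. We
require of tick and alloc that for all t and $\varsigma$, $t \sqsubset \mathit{tick}(\varsigma)$ and
$\mathit{alloc}(\varsigma) \notin \mathit{dom}(\sigma)$, where $\varsigma = \langle \_, \_, \sigma, \_, t \rangle$.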

The time-stamped CESK* machine is defined in Figure 3.
Note that occurrences of $\varsigma$ on the right-hand side of this definition
are implicitly bound to the state occurring on the left-hand
side. The evaluation function $\mathit{eval}_{CESK^*_t}$ and reachable
states $\mathit{CESK}^*_t$ are defined following the same outline as before
and omitted for space. The initial machine state is
defined as:

$$\mathit{inj}_{CESK^*_t}(e) = \langle e, \emptyset, [a_0 \mapsto \mathbf{mt}], a_0, t_0 \rangle.$$

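Satisfying definitions for the parameters are:

$$\mathit{Time} = \mathit{Addr} = \mathbb{Z}$$
$$a_0 = t_0 = 0 \qquad \mathit{tick}\langle \_, \_, \_, \_, t \rangle = t + 1 \qquad \mathit{alloc}\langle \_, \_, \_, \_, t \rangle = t.$$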

Under these definitions, the time-stamped CESK* machine 
operates in lock-step with the CESK* machine, and there- 
fore with the CEK machine, implying its correctness. 

Lemma 4 $\mathit{CESK}^*_t(e) \simeq \mathit{CESK}^*(e)$.

The time-stamped CESK* machine forms the basis of our 
abstracted machine in the following section. 
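
To make the role of the two parameters concrete, here is a hedged Python sketch of a machine that takes tick and alloc as arguments. The factory `make_machine`, the runner `run`, and the tuple encoding of terms are assumptions of this sketch. It uses integer times and addresses with tick incrementing the time and alloc returning the current time, but it starts the clock at t0 = 1 rather than 0: this is a choice made here so that the first address returned by alloc cannot coincide with a0 = 0, which already holds the empty continuation.

```python
def make_machine(tick, alloc, a0=0, t0=1):
    """Build (inj, step) for a time-stamped machine over the given parameters."""
    def inj(e):
        return (e, {}, {a0: ("mt",)}, a0, t0)

    def step(state):
        e, rho, sigma, a, t = state
        u = tick(state)                          # next time-stamp
        if isinstance(e, str):
            v, rho_v = sigma[rho[e]]
            return (v, rho_v, sigma, a, u)
        k = sigma[a]
        if e[0] == "lam" and k[0] == "mt":       # final state: value, empty stack
            return None
        b = alloc(state)                         # address for a binding or frame
        if b in sigma:
            raise ValueError("alloc returned an address already in the store")
        if e[0] == "app":
            return (e[1], rho, {**sigma, b: ("ar", e[2], rho, a)}, b, u)
        if k[0] == "ar":
            _, e1, rho1, c = k
            return (e1, rho1, {**sigma, b: ("fn", e, rho, c)}, b, u)
        _, (_, x, body), rho1, c = k             # k is an fn frame
        return (body, {**rho1, x: b}, {**sigma, b: (e, rho)}, c, u)

    return inj, step

def run(e, tick, alloc):
    """Iterate the machine to its final state."""
    inj, step = make_machine(tick, alloc)
    state = inj(e)
    while True:
        nxt = step(state)
        if nxt is None:
            return state
        state = nxt

# Concrete parameters: the time is the fifth component of the state.
tick = lambda state: state[4] + 1
alloc = lambda state: state[4]
```

The freshness check in `step` enforces the requirement on alloc at run time; a concrete allocator that reuses an address is rejected, whereas the abstract machine will deliberately allow reuse.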

2.5 Abstract time-stamped CESK* machine 

As alluded to earlier, with the time-stamped CESK* ma- 
chine, we now have a machine ready for direct abstract inter- 
pretation via a single point of approximation: the store. Our 
goal is a machine that resembles the time-stamped CESK*
machine, but operates over a finite state-space and is allowed
to be nondeterministic. Once the state-space is finite,
the transitive closure of the transition relation becomes com- 
putable, and this transitive closure constitutes a static anal- 
ysis. Buried in a path through the transitive closure is a 
possibly infinite traversal that corresponds to the concrete 
execution of the program. 

The abstracted variant of the time-stamped CESK* ma- 
chine comes from bounding the address space of the store 
and the number of times available. By bounding the address 
space, the whole state-space becomes finite. (Syntactic sets 
like Exp are infinite, but finite for any given program.) For 
the purposes of soundness, an entry in the store may be 
forced to hold several values simultaneously: 

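$$\hat\sigma \in \widehat{\mathit{Store}} = \widehat{\mathit{Addr}} \rightarrow_{\mathrm{fin}} \mathcal{P}(\widehat{\mathit{Storable}}).$$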
Figure 4: Abstract time-stamped CESK* machine.

Hence, stores now map an address to a set of storable values
rather than a single value. These collections of values model 
approximation in the analysis. If a location in the store is re- 
used, the new value is joined with the current set of values. 
When a location is dereferenced, the analysis must consider 
any of the values in the set as a result of the dereference. 

The abstract time-stamped CESK* machine is defined in 
Figure 4. The non-deterministic abstract transition relation
changes little compared with the concrete machine. We
only have to modify it to account for the possibility that
multiple storable values, which include continuations, may
reside together in the store. We handle this situation by let- 
ting the machine non-deterministically choose a particular 
value from the set at a given store location. 

The analysis is parameterized by abstract variants of the 
functions that parameterized the concrete version: 

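$$\widehat{\mathit{tick}} : \hat\Sigma \times \widehat{\mathit{Cont}} \rightarrow \widehat{\mathit{Time}}, \qquad \widehat{\mathit{alloc}} : \hat\Sigma \times \widehat{\mathit{Cont}} \rightarrow \widehat{\mathit{Addr}}.$$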

In the concrete, these parameters determine allocation and 
stack behavior. In the abstract, they are the arbiters of 
precision: they determine when an address gets re-allocated, 
how many addresses get allocated, and which values have to 
share addresses. 

Recall that in the concrete semantics, these functions con- 
sume states — not states and continuations as they do here. 
This is because in the concrete, a state alone suffices since 
the state determines the continuation. But in the abstract, a 
continuation pointer within a state may denote a multitude 
of continuations; however the transition relation is defined 
with respect to the choice of a particular one. We thus pair 
states with continuations to encode the choice. 

The abstract semantics is given by the reachable states: 



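$$\widehat{\mathit{CESK}^*_t}(e) = \{\hat\varsigma \mid \alpha(\mathit{inj}_{CESK^*_t}(e)) \longmapsto^{*}_{\widehat{CESK^*_t}} \hat\varsigma\}.$$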



Soundness and decidability: We have endeavored to 
evolve the abstract machine gradually so that its fidelity 
in soundly simulating the original CEK machine is both 
intuitive and obvious. To formally establish soundness of 
the abstract time-stamped CESK* machine, we use an abstraction
function, defined in Figure 5, from the state-space
of the concrete time-stamped machine into the abstracted
state-space.

The abstraction map over times and addresses is defined
so that the parameters $\widehat{\mathit{alloc}}$ and $\widehat{\mathit{tick}}$ are sound simulations
of the parameters alloc and tick, respectively. We also define
the partial order ($\sqsubseteq$) on the abstract state-space as the natural
point-wise, element-wise, component-wise and member-wise
lifting, wherein the partial orders on the sets Exp and
Addr are flat. Then, we can prove that the abstract machine's
transition relation simulates the concrete machine's transition
relation.

Figure 5: Abstraction map, $\alpha : \Sigma_{CESK^*_t} \rightarrow \hat\Sigma_{CESK^*_t}$.

Theorem 1 (Soundness)

If $\varsigma \longmapsto_{CEK} \varsigma'$ and $\alpha(\varsigma) \sqsubseteq \hat\varsigma$, then there exists an abstract
state $\hat\varsigma'$ such that $\hat\varsigma \longmapsto_{\widehat{CESK^*_t}} \hat\varsigma'$ and $\alpha(\varsigma') \sqsubseteq \hat\varsigma'$.

Proof. By Lemmas 3 and 4, it suffices to prove soundness
with respect to $\longmapsto_{CESK^*_t}$. Assume $\varsigma \longmapsto_{CESK^*_t} \varsigma'$ and
$\alpha(\varsigma) \sqsubseteq \hat\varsigma$. Because $\varsigma$ transitioned, exactly one of the rules
from the definition of ($\longmapsto_{CESK^*_t}$) applies. We split by cases
on these rules. The rule for the second case is deterministic
and follows by calculation. For the remaining (nondeterministic)
cases, we must show an abstract state exists such
that the simulation is preserved. By examining the rules for
these cases, we see that all three hinge on the abstract store
in $\hat\varsigma$ soundly approximating the concrete store in $\varsigma$, which
follows from the assumption that $\alpha(\varsigma) \sqsubseteq \hat\varsigma$. □

Theorem 2 (Decidability) 

Membership of $\hat\varsigma$ in $\widehat{\mathit{CESK}^*_t}(e)$ is decidable.

Proof. The state-space of the machine is non-recursive 
with finite sets at the leaves on the assumption that ad- 
dresses are finite. Hence reachability is decidable since the 
abstract state-space is finite. □ 
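
The decidability argument corresponds to a simple worklist exploration. The Python sketch below illustrates it under several assumptions that are ours and not taken from the figures: terms use a hypothetical tuple encoding; the allocator is monovariant, naming binding addresses after the variable and continuation addresses after the expression involved; and because such an allocator ignores time, the time component is omitted. Stores map each address to a set of storables, updates are joins, and dereferencing ranges over every member of the set.

```python
from collections import deque

def freeze(d):
    """Make a dict hashable so that states can be kept in a set."""
    return frozenset(d.items())

def join(store, addr, val):
    """Store join: add val to the set at addr instead of overwriting."""
    d = dict(store)
    d[addr] = d.get(addr, frozenset()) | {val}
    return freeze(d)

def successors(state):
    """All abstract successors of a state (the relation is nondeterministic)."""
    e, rho, store, a = state
    sigma, env = dict(store), dict(rho)
    if isinstance(e, str):
        for (v, rho_v) in sigma[env[e]]:         # any closure at the address
            yield (v, rho_v, store, a)
    elif e[0] == "app":
        b = ("ar", e)                            # assumed allocator for ar frames
        yield (e[1], rho, join(store, b, ("ar", e[2], rho, a)), b)
    else:
        for k in sigma[a]:                       # any continuation at the pointer
            if k[0] == "ar":
                _, e1, rho1, c = k
                b = ("fn", e1)                   # assumed allocator for fn frames
                yield (e1, rho1, join(store, b, ("fn", e, rho, c)), b)
            elif k[0] == "fn":
                _, (_, x, body), rho1, c = k
                b = ("var", x)                   # assumed allocator for bindings
                rho2 = freeze({**dict(rho1), x: b})
                yield (body, rho2, join(store, b, (e, rho)), c)

def analyze(e):
    """Reachable abstract states from the initial state, by worklist."""
    init = (e, frozenset(), freeze({"halt": frozenset({("mt",)})}), "halt")
    seen, work = {init}, deque([init])
    while work:
        for nxt in successors(work.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen

def final_values(states):
    """Function terms that may be returned to the empty continuation."""
    return {e for (e, rho, store, a) in states
            if not isinstance(e, str) and e[0] == "lam"
            and ("mt",) in dict(store)[a]}
```

Because the addresses are drawn from a finite set determined by the program text, the loop in `analyze` terminates even on programs that diverge concretely, such as the self-application of (λx.(x x)).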

3. ABSTRACT STATE AND CONTROL 

We have shown that store-allocated continuations make 
abstract interpretation of the CESK* machine straightfor- 
ward. In this section, we want to show that the tight corre- 
spondence between concrete and abstract persists after the 
addition of language features such as conditionals, side ef- 
fects, and first-class continuations. We tackle each feature, 
and present the additional machinery required to handle 
each one. In most cases, the path from a canonical concrete
machine to a pointer-refined abstraction of the machine
is so simple we only show the abstracted system. In doing 
so, we are arguing that this abstract machine-oriented ap- 
proach to abstract interpretation represents a flexible and 
viable framework for building program analyses. 

To handle conditionals, we extend the language with a
new syntactic form, (if e e e), and introduce a base value
#f, representing false. Conditional expressions induce a new
continuation form: $\mathbf{if}(e'_0, e'_1, \rho, a)$, which represents the evaluation
context $E[(\mathtt{if}\ [\,]\ e_0\ e_1)]$ where $\rho$ closes $e'_0$ to represent
$e_0$, $\rho$ closes $e'_1$ to represent $e_1$, and a is the address of
the representation of E.

Figure 6: Abstract extended CESK* machine.



Side effects are fully amenable to our approach; we introduce
Scheme's set! for mutating variables using the
(set! x e) syntax. The set! form evaluates its subexpression
e and assigns the value to the variable x. Although
set! expressions are evaluated for effect, we follow Felleisen
et al. and specify set! expressions evaluate to the value
of x before it was mutated [7, page 166]. The evaluation
context $E[(\mathtt{set!}\ x\ [\,])]$ is represented by $\mathbf{set}(a_0, a_1)$, where
$a_0$ is the address of x's value and $a_1$ is the address of the
representation of E.

First-class control is introduced by adding a new base 
value callcc which reifies the continuation as a new kind 
of applicable value. Denoted values are extended to in- 
clude representations of continuations. Since continuations 
are store-allocated, we choose to represent them by address. 
When an address is applied, it represents the application of 
a continuation (reified via callcc) to a value. The contin- 
uation at that point is discarded and the applied address is 
installed as the continuation. 

The resulting grammar is: 

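$$\begin{aligned}
e \in \mathit{Exp} &= \ldots \mid (\mathtt{if}\ e\ e\ e) \mid (\mathtt{set!}\ x\ e) \\
\kappa \in \mathit{Cont} &= \ldots \mid \mathbf{if}(e, e, \rho, a) \mid \mathbf{set}(a, a) \\
v \in \mathit{Val} &= \ldots \mid \mathtt{\#f} \mid \mathtt{callcc} \mid a.
\end{aligned}$$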

We show only the abstract transitions, which result from 
store-allocating continuations, time-stamping, and abstract- 
ing the concrete transitions for conditionals, mutation, and 
control. The first three machine transitions deal with con- 
ditionals; here we follow the Scheme tradition of considering 
all non-false values as true. The fourth and fifth transitions 
deal with mutation. 

The remaining three transitions deal with first-class con- 
trol. In the first of these, callcc is being applied to a closure 
value v. The value v is then "called with the current continuation",
i.e., v is applied to a value that represents the continuation
at this point. In the second, callcc is being applied
to a continuation (address). When this value is applied to 
the reified continuation, it aborts the current computation, 



installs itself as the current continuation, and puts the reified 
continuation "in the hole". Finally, in the third, a continua- 
tion is being applied; c gets thrown away, and v gets plugged 
into the continuation b. In all cases, these transitions result 
from pointer-refinement, time-stamping, and abstraction of 
the usual machine transitions. 

4. ABSTRACT GARBAGE COLLECTION 

Garbage collection determines when a store location has 
become unreachable and can be re-allocated. This is signif- 
icant in the abstract semantics because an address may be 
allocated to multiple values due to finiteness of the address 
space. Without garbage collection, the values allocated to 
this common address must be joined, introducing impreci- 
sion in the analysis (and inducing further, perhaps spuri- 
ous, computation). By incorporating garbage collection in 
the abstract semantics, the location may be proved to be 
unreachable and safely overwritten rather than joined, in 
which case no imprecision is introduced. 

Like the rest of the features addressed in this paper, we 
can incorporate abstract garbage collection into our static 
analyzers by a straightforward pointer-refinement of text- 
book accounts of concrete garbage collection, followed by a 
finite store abstraction. 

Concrete garbage collection is defined in terms of a GC
machine that computes the reachable addresses in a store [7,
page 172]:

$$\langle \mathcal{G}, \mathcal{B}, \sigma \rangle \longmapsto_{GC} \langle (\mathcal{G} \cup \mathit{LL}_\sigma(\sigma(a))) \setminus (\mathcal{B} \cup \{a\}), \mathcal{B} \cup \{a\}, \sigma \rangle \quad \text{if } a \in \mathcal{G}.$$

This machine iterates over a set of reachable but unvisited
"grey" locations $\mathcal{G}$. On each iteration, an element is removed
and added to the set of reachable and visited "black" locations
$\mathcal{B}$. Any newly reachable and unvisited locations, as determined
by the "live locations" function $\mathit{LL}_\sigma$, are added to
the grey set. When there are no grey locations, the black set
contains all reachable locations. Everything else is garbage.

The live locations function computes a set of locations 
which may be used in the store. Its definition varies based 
on the machine being garbage collected, but the definition 
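appropriate for the CESK* machine of Section 2.3 is:

$$\begin{aligned}
\mathit{LL}_\sigma(e, \rho) &= \mathit{LL}_\sigma(\rho|\mathit{fv}(e)) \\
\mathit{LL}_\sigma(\rho) &= \mathit{rng}(\rho) \\
\mathit{LL}_\sigma(\mathbf{mt}) &= \emptyset \\
\mathit{LL}_\sigma(\mathbf{fn}(v, \rho, a)) &= \{a\} \cup \mathit{LL}_\sigma(v, \rho) \cup \mathit{LL}_\sigma(\sigma(a)) \\
\mathit{LL}_\sigma(\mathbf{ar}(e, \rho, a)) &= \{a\} \cup \mathit{LL}_\sigma(e, \rho) \cup \mathit{LL}_\sigma(\sigma(a)).
\end{aligned}$$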

We write $\rho|\mathit{fv}(e)$ to mean $\rho$ restricted to the domain of free
variables in e. We assume the least-fixed-point solution in 
the calculation of the function LL in cases where it recurs 
on itself. 
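
The GC machine and the live locations function can be sketched together in Python. The sketch below is illustrative and rests on assumptions of ours: stores are dictionaries whose entries are either closures `((lam x. e), rho)` or frames `("ar", e, rho, a)`, `("fn", v, rho, a)`, `("mt",)`; the names `fv`, `ll_term`, `ll`, `reachable`, and `collect` are hypothetical; and `collect` seeds the grey set with the live locations of the control string together with the continuation pointer, which reaches the same black set as seeding with the live locations of the continuation itself.

```python
def fv(e):
    """Free variables of a term."""
    if isinstance(e, str):
        return {e}
    if e[0] == "lam":
        return fv(e[2]) - {e[1]}
    return fv(e[1]) | fv(e[2])

def ll_term(e, rho):
    """LL(e, rho): range of rho restricted to the free variables of e."""
    return {rho[x] for x in fv(e)}

def ll(s, sigma, seen):
    """LL of a storable; `seen` cuts cycles to give the least fixed point."""
    if isinstance(s[0], tuple):                  # closure ((lam x. e), rho)
        return ll_term(s[0], s[1])
    if s[0] == "mt":
        return set()
    _, e, rho, a = s                             # ar or fn frame
    out = {a} | ll_term(e, rho)
    if a not in seen:                            # follow the pointer through sigma
        seen.add(a)
        out |= ll(sigma[a], sigma, seen)
    return out

def reachable(roots, sigma):
    """Grey/black iteration: returns the black set once grey is empty."""
    grey, black = set(roots), set()
    while grey:
        a = grey.pop()
        black.add(a)
        grey = (grey | ll(sigma[a], sigma, set())) - black
    return black

def collect(state):
    """Garbage-collecting transition: restrict the store to live addresses."""
    e, rho, sigma, a = state
    live = reachable(ll_term(e, rho) | {a}, sigma)
    return (e, rho, {b: sigma[b] for b in live}, a)
```

For an abstract store, where each address holds a set of storables, the same structure applies with `ll` mapped over every member of the set and the results unioned.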

The pointer-refinement requires parameterizing the LL 
function with a store used to resolve pointers to continu- 
ations. A nice consequence of this parameterization is that 
we can re-use LL for abstract garbage collection by supplying 
it an abstract store for the parameter. Doing so only neces- 
sitates extending LL to the case of sets of storable values: 

\[
LL_\sigma(S) = \bigcup_{s \in S} LL_\sigma(s)
\]
Figure 7: GC transition for the CESK* machine. 



The CESK* machine incorporates garbage collection by 
a transition rule that invokes the GC machine as a subroutine 
to remove garbage from the store (Figure 7). The 
garbage collection transition introduces non-determinism to 
the CESK* machine because it applies to any machine state 
and thus overlaps with the existing transition rules. The 
non-determinism is interpreted as leaving the choice of when 
to collect garbage up to the machine. 

The abstract CESK* incorporates garbage collection by 
the concrete garbage collection transition, i.e., we re-use the 
definition in Figure 7 with an abstract store, $\hat{\sigma}$, in place of 
the concrete one. Consequently, it is easy to verify that abstract 
garbage collection approximates its concrete counterpart. 
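
A minimal Python sketch of this re-use follows. It assumes, purely for illustration, that an abstract store maps each address to a set of storables and that a storable is modelled as the tuple of addresses it mentions. The names `ll_set` and `abstract_gc` are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set, Tuple

Addr = str
# Assumption: a storable is modelled as the tuple of addresses it mentions.
Storable = Tuple[Addr, ...]
AbsStore = Dict[Addr, FrozenSet[Storable]]


def ll_set(values: Iterable[Storable],
           ll: Callable[[Storable], Set[Addr]]) -> Set[Addr]:
    """Live locations of a set of storables: the union over its members."""
    out: Set[Addr] = set()
    for s in values:
        out |= ll(s)
    return out


def abstract_gc(roots: Iterable[Addr], store: AbsStore,
                ll: Callable[[Storable], Set[Addr]]) -> AbsStore:
    """Same grey/black loop as the concrete machine, run over an abstract store."""
    grey, black = set(roots), set()
    while grey:
        a = grey.pop()
        black.add(a)
        grey |= ll_set(store.get(a, frozenset()), ll) - black
    return {a: vs for a, vs in store.items() if a in black}


store: AbsStore = {
    "a1": frozenset({("a2",), ("a3",)}),  # two values joined at one address
    "a2": frozenset({()}),
    "a3": frozenset({("a1",)}),
    "a4": frozenset({("a1",)}),           # unreachable from a1
}
collected = abstract_gc({"a1"}, store, lambda s: set(s))
```

The only change from a concrete collector in this sketch is that looking up an address yields a set, so the live locations of every member are unioned.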

The CESK* machine may collect garbage at any point 
in the computation, so an abstract interpretation must 
soundly approximate all possible choices of when to trigger 
a collection, which the abstract CESK* machine does 
correctly. This may be a useful analysis of garbage collection; 
however, it fails to be a useful analysis with garbage 
collection: for soundness, the abstracted machine must consider 
the case in which garbage is never collected, implying no 
storage is reclaimed to improve precision. 

However, we can leverage abstract garbage collection to 
reduce the state-space explored during analysis and to im- 
prove precision and analysis time. This is achieved (again) 
by considering properties of the concrete machine, which 
abstract directly; in this case, we want the concrete ma- 
chine to deterministically collect garbage. Determinism of 
the CESK* machine is restored by defining the transition 
relation as a non-GC transition (one of the existing transition rules) followed by the 
GC transition (Figure 7). The state-space of this concrete 
machine is "garbage free" and consequently the state-space 
of the abstracted machine is "abstract garbage free." 

In the concrete semantics, a nice consequence of this prop- 
erty is that although continuations are allocated in the store, 
they are deallocated as soon as they become unreachable, 
which corresponds to when they would be popped from the 
stack in a non-pointer-refined machine. Thus the concrete 
machine really manages continuations like a stack. 
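
The "step, then collect" transition and its stack-like effect on store-allocated continuations can be sketched in Python. This is an assumption-laden toy: a state is a pair of root addresses and a store, each stored value is the tuple of addresses it mentions, and `toy_return` is a hypothetical stand-in for a real non-GC transition.

```python
def reachable(roots, store):
    """Grey/black traversal over a store whose values are tuples of addresses."""
    grey, black = set(roots), set()
    while grey:
        a = grey.pop()
        black.add(a)
        grey |= set(store[a]) - black
    return black


def gc(state):
    """GC transition: restrict the store to addresses reachable from the roots."""
    roots, store = state
    keep = reachable(roots, store)
    return roots, {a: refs for a, refs in store.items() if a in keep}


def with_gc(step):
    """Deterministic transition: one non-GC step followed by a collection."""
    return lambda state: gc(step(state))


def toy_return(state):
    """Hypothetical non-GC step: 'return' from frame k1 to its parent k0."""
    _roots, store = state
    return frozenset({"k0"}), store


state = (frozenset({"k1"}), {"k1": ("k0",), "k0": ()})
after = with_gc(toy_return)(state)
```

After the step, the frame at `k1` is unreachable and is removed immediately, just as it would have been popped from a stack.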

Similarly, in the abstract semantics, continuations are deal- 
located as soon as they become unreachable, which often 
corresponds to when they would be popped. We say "often" 
because due to the finiteness of the store, this correspon- 
dence cannot always hold. However, this approach gives a 
good finite approximation to infinitary stack analyses that 
can always match calls and returns. 

5. RELATED WORK 

The study of abstract machines for the λ-calculus began 
with Landin's SECD machine [11], the systematic construction 
of machines from semantics with Reynolds's definitional 
interpreters [20], the theory of abstract interpretation with 
the seminal work of Cousot and Cousot W, and static analysis 
of the λ-calculus with Jones's coupling of abstract machines 
and abstract interpretation [9]. All have been active 
areas of research since their inception, but only recently 
have well known abstract machines been connected with abstract 
interpretation by Midtgaard and Jensen [14, 15]. We 
strengthen the connection by demonstrating a general technique 
for abstracting abstract machines. 

The approximation of abstract machine states for the anal- 
ysis of higher-order languages goes back to Jones [9], who 
argued abstractions of regular tree automata could solve 
the problem of recursive structure in environments. We re- 
invoked that wisdom to eliminate the recursive structure of 
continuations by allocating them in the store. 

Midtgaard and Jensen present a 0CFA for a CPS language 
[14]. The approach is based on Cousot-style calculational 
abstract interpretation [3], applied to a functional 
language. Like the present work, Midtgaard and Jensen 
start with a known abstract machine for the concrete se- 
mantics, the CE machine of Flanagan, et al. [^, and employ 
a reachable-states model. They then compose well-known 
Galois connections to reveal a 0CFA with reachability in 
the style of Ayers [1]. The CE machine is not sufficient to 
interpret direct-style programs, so the analysis is specialized 
to programs in continuation-passing style. 

Although our approach is not calculational like Midtgaard 
and Jensen's, it continues in their vein by applying abstract 
interpretation to well known machines, extending the ap- 
plication to direct-style machines to obtain a parameterized 
family of analyses that accounts for polyvariance. 

Static analyzers typically hemorrhage precision in the pres- 
ence of exceptions and first-class continuations: they jump 
to the top of the lattice of approximation when these features 
are encountered. Conversion to continuation- and exception- 
passing style can handle these features without forcing a 
dramatic ascent of the lattice of approximation [21]. The 
cost of this conversion, however, is lost knowledge — both 
approaches obscure static knowledge of stack structure, by 
desugaring it into syntax. 

Might and Shivers introduced the idea of using abstract 
garbage collection to improve precision and efficiency in flow 
analysis [16]. They develop a garbage-collecting abstract 
machine for a CPS language and prove it correct. We ex- 
tend abstract garbage collection to direct-style languages in- 
terpreted on the CESK machine. 

6. CONCLUSIONS AND PERSPECTIVE 

We have demonstrated a derivational approach to pro- 
gram analysis that yields novel abstract interpretations of 
languages with higher-order functions, control, state, and 
garbage collection. These abstract interpreters are obtained 
by a straightforward pointer refinement and structural ab- 
straction that bounds the address space, making the ab- 
stract semantics safe and computable. The technique allows 
concrete implementation technology, such as garbage col- 
lection, to be imported straightforwardly into that of static 
analysis, bearing immediate benefits. More generally, an ab- 
stract machine based approach to analysis shifts the focus 
of engineering efforts from the design of complex analytic 
models such as involved constraint languages back to the 
design of programming languages and machines, from which 
analysis can be derived. Finally, our approach uniformly 
scales up to richer language features such as laziness, stack 
inspection, exceptions, and object-orientation. We speculate 
that store-allocating bindings and continuations is sufficient 
for a straightforward abstraction of most existing machines. 

Looking forward, a semantics-based approach opens new 
possibilities for design. Context-sensitive analysis can have 
daunting complexity [24], which we have made efforts to 
tame [17], but modular program analysis is crucial to overcome 
the significant cost of precise abstract interpretation. 
Modularity can be achieved without needing to design clever 
approximations, but rather by designing modular semantics 
from which modular analyses follow systematically [23]. 
Likewise, push-down analyses offer infinite state-space ab- 
stractions with perfect call-return matching while retaining 
decidability. Our approach expresses this form of abstrac- 
tion naturally: the store remains bounded, but continua- 
tions stay on the stack. 